362 research outputs found

Visual Data Mining

Author: C Chen
DA Keim
F Oliveira de
JJ Thomas
L Chittaro
T Soukup
UK Demšar
Publication venue: ScholarWorks@CWU
Publication date: 01/01/2017

Occlusion is one of the major problems for interactive visual knowledge discovery and data mining when finding patterns in multidimensional data. This project proposes a hybrid method, called GLC-S, that combines visual and analytical means to deal with occlusion in visual knowledge discovery. GLC-S visualizes n-D data in 2D in a set of Shifted Paired Coordinates (SPC). A set of Shifted Paired Coordinates for n-D data consists of n/2 pairs of common Cartesian coordinates that are shifted relative to each other to avoid their overlap. Each n-D point A is represented as a directed graph A* in SPC, where each node is the 2D projection of A in the respective pair of Cartesian coordinates. The proposed GLC-S method significantly decreases the cognitive load of analyzing n-D data and simplifies pattern discovery in n-D data. The GLC-S method iteratively splits n-D data into non-overlapping clusters (hyper-rectangles) around local centers and visualizes only the data within these clusters at each iteration. Each cluster is required to contain cases of only one class and to be the largest cluster with this property in the SPC visualization. Such sequential splitting allows (1) avoiding occlusion, (2) finding visually local classification patterns and rules, and (3) combining local sub-rules into a global rule that classifies all given data of two or more classes. The computational experiments with Wisconsin Breast Cancer data (9-D), User Knowledge Modeling data (6-D), and Letter Recognition data (17-D) from the UCI Machine Learning Repository confirm this capability. At each iteration, these data were split into training (70%) and validation (30%) data. It required 3 iterations for the Wisconsin Breast Cancer data, 4 iterations for the User Knowledge Modeling data, and 5 iterations for the Letter Recognition data, yielding 3, 4, and 5 local sub-rules, respectively, that covered over 95% of all n-D data points with 100% accuracy in both training and validation experiments.
After each iteration, the data used in that iteration are removed and the remaining data are used in the next iteration. This removal process also helps to decrease occlusion. The GLC-S algorithm refuses to classify the remaining cases that are not covered by these rules, i.e., that do not belong to the hyper-rectangles found. The interactive visualization process in SPC allows adjusting the sides of a hyper-rectangle to maximize its size without overlapping the hyper-rectangles of the opposing classes. The GLC-S method splits data using a fixed split of the n coordinates into pairs. This hybrid visual and analytical approach avoids throwing all data of several classes into a single visualization plot, which typically ends up as a messy, highly occluded picture that hides useful patterns. This approach allows revealing these hidden patterns. The visualization process in SPC is reversible (lossless), i.e., all n-D information is visualized in 2D and can be restored from the 2D visualization for each n-D case. This hybrid visual analytics method allowed classifying n-D data in a way that can be communicated to the user in an understandable visual form.
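
The SPC mapping and the refusal behavior described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (to_spc, from_spc, classify), the shift values, and the rule format are hypothetical, and the sketch assumes an even number of coordinates, since the abstract does not say how an odd n is paired.

```python
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
# A rule is one hyper-rectangle: (lower bounds, upper bounds, class label).
Rule = Tuple[Sequence[float], Sequence[float], str]


def to_spc(point: Sequence[float], shifts: Sequence[Point2D]) -> List[Point2D]:
    """Map an n-D point to the nodes of its directed graph in SPC.

    Coordinates are paired as (x1, x2), (x3, x4), ...; pair k is drawn in
    its own Cartesian system, offset by shifts[k]. Consecutive nodes are
    joined by directed edges (node k -> node k+1).
    """
    if len(point) % 2 != 0:
        raise ValueError("this sketch assumes an even number of coordinates")
    if len(shifts) != len(point) // 2:
        raise ValueError("one shift is needed per coordinate pair")
    return [
        (point[2 * k] + dx, point[2 * k + 1] + dy)
        for k, (dx, dy) in enumerate(shifts)
    ]


def from_spc(nodes: Sequence[Point2D], shifts: Sequence[Point2D]) -> List[float]:
    """Restore the n-D point from its 2D nodes (the mapping is lossless)."""
    point: List[float] = []
    for (u, v), (dx, dy) in zip(nodes, shifts):
        point.extend([u - dx, v - dy])
    return point


def classify(point: Sequence[float], rules: Sequence[Rule]) -> Optional[str]:
    """Return the label of the first hyper-rectangle containing the point.

    Returns None when no rule covers the point, mirroring the refusal to
    classify uncovered cases.
    """
    for lower, upper, label in rules:
        if all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper)):
            return label
    return None


shifts = [(0.0, 0.0), (10.0, 0.0)]          # hypothetical shifts for a 4-D point
nodes = to_spc([1.0, 2.0, 3.0, 4.0], shifts)
restored = from_spc(nodes, shifts)
rules = [([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 5.0, 5.0], "class A")]
inside = classify([1.0, 2.0, 3.0, 4.0], rules)
outside = classify([9.0, 9.0, 9.0, 9.0], rules)
```

Here nodes holds one 2D node per coordinate pair, restored equals the original 4-D point, and outside is None because the second point lies in no hyper-rectangle.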

ScholarWorks at Central Washington University

On the challenges and opportunities in visualization for machine learning and knowledge extraction: A research agenda

Author: A Holzinger
D Sacha
DA Keim
H Kim
H Mueller
J Cohen
J Kehrer
JB Tenenbaum
JS Yi
M Hund
M Ward
R Kosara
RN Shepard
S Marsland
ST Roweis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

We describe a selection of challenges at the intersection of machine learning and data visualization and outline a subjective research agenda based on professional and personal experience. The unprecedented increase in the amount, variety and value of data has been significantly transforming the way that scientific research is carried out and businesses operate. Within data science, which has emerged as a practice to enable this data-intensive innovation by gathering together and advancing the knowledge from fields such as statistics, machine learning, knowledge extraction, data management, and visualization, visualization plays a unique and perhaps the ultimate role as an approach to facilitate human and computer cooperation, and particularly to enable the analysis of diverse and heterogeneous data using complex computational methods where algorithmic results are challenging to interpret and operationalize. Whilst algorithm development is surely at the center of the whole pipeline in disciplines such as Machine Learning and Knowledge Discovery, it is visualization which ultimately makes the results accessible to the end user. Visualization can thus be seen as a mapping from arbitrarily high-dimensional abstract spaces to lower dimensions, and it plays a central and critical role in interacting with machine learning algorithms, particularly in interactive machine learning (iML), which includes the human in the loop. The central goal of the CD-MAKE VIS workshop is to spark discussions at this intersection of visualization, machine learning and knowledge discovery and bring together experts from these disciplines. This paper discusses a perspective on the challenges and opportunities in the integration of these disciplines and presents a number of directions and strategies for further research.

City Research Online

Exploratory topic modeling with distributional semantics

Author: A Treisman
DA Keim
DM Blei
J Risch
L Barth
M Bostock
S Fortunato
S Lohmann
S Palmer
Y Bengio
Publication date: 16/07/2015

As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge about the content is required and highly open-ended tasks can be supported. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, the representation of the latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail. This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. From the granular level of terms and their semantic similarity relations, global topic structures emerge as clustered regions and gradients of concepts. Moreover, the paper discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration. Comment: Conference: The Fourteenth International Symposium on Intelligent Data Analysis (IDA 2015)

arXiv.org e-Print Archive

Influence of vaccine-preventable diseases and HIV infection on demand for an infectious diseases service in Rio de Janeiro State, Brazil, over 22 years – Part II (1995-2016)

Author: Ferreira Laura da Cunha
Keim Luiz Sérgio
Oliveira Solange Artimos de
Setúbal Sérgio
Publication venue: Universidade de São Paulo. Instituto de Medicina Tropical de São Paulo
Publication date: 01/01/2019

Patients’ data collected during daily clinical care are extremely important for improving the allocation of healthcare resources and for assessing healthcare demands. The prospective gathering of these data over decades allowed us to describe the trends of infectious diseases in a tertiary hospital. The results concerning the period between 1965 and 1994 described the exponential increase in the incidence of HIV infection and its important effects on our institutional mortality. The present study describes the demand for the same hospital between 1995 and 2016. There were 4,691 admissions and the main causes of admissions were, in descending order, HIV infection (1,312, 28.0%), noninfectious diseases (447, 9.5%), meningoencephalitis (432, 9.2%), soft tissue infections (427, 9.1%), tuberculosis (272, 5.8%), pneumonias (212, 4.5%) and leptospirosis (212, 4.5%). There were 864 readmissions, most due to HIV infections (65.2%). The institutional mortality fell from 16.9% in the first two years to 5.0% in the last two years of the study. The case-fatality rates among the HIV patients decreased from more than 40% to approximately 5% over the study period. In the last two decades, the hospital experienced a decrease in demand due to vaccine-preventable diseases. The demand for children has fallen and the demand for patients over the age of 50 has increased. These results reflect the improvement in public health standards over more than half a century and the positive effects of the National Immunization Program. They also illustrate the sharp decline in the HIV case-fatality rate after the introduction of combined antiretroviral therapy.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Cadernos Espinosanos (E-Journal)

Visual Analytics for Network Security and Critical Infrastructures

Author: A Endert
A Kott
D Sacha
DA Keim
G Sun
J Gao
MR Endsley
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

A comprehensive analysis of cyber attacks is important for a better understanding of their nature and their origin. Providing sufficient insight into such a vast amount of diverse (and sometimes seemingly unrelated) data is a task that is suitable neither for humans nor for fully automated algorithms alone. Not only a combination of the two approaches but also a continuous reasoning process that is capable of generating a sufficient knowledge base is indispensable for a better understanding of the events. Our research is focused on designing new exploratory methods and interactive visualizations in the context of network security. The knowledge generation loop is important for its ability to help analysts refine their understanding of the processes that continuously occur and to offer them better insight into network security related events. In this paper, we formulate the research questions that relate to the proposed solution.

Univerzitní repozitář Masarykovy univerzity

Criterion-Based Grading, Agile Goal Setting, and Course (Un)Completion Strategies

Author: C Douce
DA Keim
G Johnson
GM Seddon
J Hattie
J Spacco
Janet Carter
K Illeris
R Lister
S Merriam
Ursula Fuller
Publication venue: Springer
Publication date: 01/01/2019

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

The four faces of information visualization: A conceptual framework for a postgraduate program

Author: Alexandrino da Silva
Bonsiepe
Bradshaw
Cairo
Costa
Davenport
Few
Gray
Hall
Keim
Kirk
Lorenz
Machado
Meirelles
Norman
Schuller
Shaw
Stray
Tukey
Wildbur
Wright
Publication venue: IEEE
Publication date: 01/01/2017

The multidisciplinary nature of information visualization is today fairly consensual in both professional and academic communities: data analysis, information design, and storytelling, among other subjects, are common drivers in this field. The systematic study of this cross-fertilization, patent in the way the concept's definition varies according to the perspective being adopted, represents an important and needed addition to the critical mass of a relatively recent area of knowledge. The proposal of a single unified definition of information visualization being beyond the scope of this paper, it instead summons and discusses its multiple viewpoints to help design a postgraduate program on the topic, aiming to simultaneously start an open debate as its implementation phase goes on and new questions are subsequently raised.

Repositório Institucional do ISCTE-IUL

Evaluation of two interaction techniques for visualization of dynamic graphs

Author: B Bach
C Ware
D Archambault
DA Keim
GD Rey
GG Van De Bunt
H Lam
HC Purchase
I Boyandin
I Herman
JS Yi
JW Ahn
K Misue
M Farrugia
M McGuffin
M Smuc
MK Coleman
R Spence
RA Becker
S Ghani
T von Landesberger
U Brandes
W Aigner
WA Pike
Publication date: 01/01/2016

Several techniques for visualization of dynamic graphs are based on different spatial arrangements of a temporal sequence of node-link diagrams. Many studies in the literature have investigated the importance of maintaining the user's mental map across this temporal sequence, but usually each layout is considered as a static graph drawing and the effect of user interaction is disregarded. We conducted a task-based controlled experiment to assess the effectiveness of two basic interaction techniques: the adjustment of the layout stability and the highlighting of adjacent nodes and edges. We found that generally both interaction techniques increase accuracy, sometimes at the cost of longer completion times, and that highlighting outclasses stability adjustment for many tasks except the most complex ones. Comment: Appears in the Proceedings of the 24th International Symposium on Graph Drawing and Network Visualization (GD 2016)

arXiv.org e-Print Archive

Graph Drawing E-print Archive

Goal-Based Selection of Visual Representations for Big Data Analytics

Author: A Abela
A Kleppe
AS Dadzie
D Mindolin
DA Keim
J Bertin
J Chandra
K Börner
N Kano
O Peña
R Marty
S Few
SS Stevens
Publication venue: Heidelberg
Publication date: 01/01/2017

The H2020 TOREADOR Project adopts a model-driven architecture to streamline big data analytics and make it widely available to companies as a service. Our work in this context focuses on visualization, in particular on how to automate the translation of the visualization objectives declared by the user into a suitable visualization type. To this end, we first define a visualization context based on seven prioritizable coordinates for assessing the user's objectives and describing the data to be visualized; then we propose a skyline-based technique for automatically translating a visualization context into a set of suitable visualization types. Finally, we evaluate our approach on a real use case excerpted from the pilot applications of TOREADOR.
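
As a rough illustration of what a skyline-based selection looks like in general, the Python sketch below applies a generic skyline (Pareto-front) operator to candidate visualization types. It is not the TOREADOR technique itself: the chart names, the three-component suitability scores (shortened from the seven coordinates mentioned in the abstract), and the "higher is better" convention are all assumptions made for illustration, and coordinate priorities are not modeled.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Keep every candidate that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other_score, score)
            for other_name, other_score in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical suitability scores (higher is better), one value per coordinate.
candidates = {
    "bar chart": (3, 2, 1),
    "line chart": (2, 3, 1),
    "pie chart": (1, 1, 1),
    "scatter plot": (3, 3, 0),
}
suitable = skyline(candidates)
```

With these made-up scores, the pie chart is dropped because the bar chart is at least as good on every coordinate and strictly better on two, while the other three types remain as mutually incomparable options.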

ZENODO

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Perceiving Patterns in Parallel Coordinates: Determining Thresholds for Identification of Relationships

Author: Forsell C
Hinterberger H.
Keim DA
Kowler E.
Leek MR
Rose D.
Wegman EJ
Publication venue: Springer Science and Business Media LLC